Introspection #385

Open

frodesundby wants to merge 16 commits into main from introspection

Conversation

@frodesundby
Contributor

No description provided.


Copilot AI left a comment


Pull request overview

Adds “instance group” introspection to the GraphQL API so clients can inspect ReplicaSet-backed rollout revisions, including env vars, mounted files, instances (pods), and recent K8s events.

Changes:

  • Introduces instancegroup workload package (models, queries, event translation, dataloader + ReplicaSet watcher).
  • Extends GraphQL schema and resolvers to expose Application.instanceGroups and InstanceGroup details.
  • Adjusts Pod transformer to retain metadata.ownerReferences so pods can be matched to ReplicaSets.
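
A client query against the new surface might look like the sketch below. The `Application.instanceGroups` field comes from the overview above; the parent path and the selected subfields are hypothetical and would need to match the actual schema in `internal/graph/schema/instancegroup.graphqls`:

```graphql
query {
  # Hypothetical path to an Application; adjust to the real schema.
  team(slug: "my-team") {
    application(name: "my-app", environment: "dev") {
      instanceGroups {
        # Subfields guessed from the file descriptions below.
        name
        instances { name }
        environmentVariables { name value }
        files { path }
        events { message }
      }
    }
  }
}
```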

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
internal/workload/instancegroup/queries.go Core listing + introspection logic for env vars, mounts, and instance (pod) membership
internal/workload/instancegroup/events.go Fetches and translates K8s events for ReplicaSets/pods into user-friendly messages
internal/workload/instancegroup/models.go Defines InstanceGroup and related GraphQL-facing models/enums
internal/workload/instancegroup/dataloader.go Adds loaders + ReplicaSet transformer + environment->dynamic client lookup
internal/workload/instancegroup/node.go Registers InstanceGroup as a Node ident type
internal/workload/dataloader.go Keeps Pod ownerReferences in cached pods to enable ReplicaSet ownership matching
internal/kubernetes/watchers/watchers.go Wires a new ReplicaSet watcher into the global watcher set
internal/graph/schema/instancegroup.graphqls Adds GraphQL types/fields for instance groups and related types
internal/graph/instancegroup.resolvers.go Implements resolvers for instanceGroups, env vars, mounts, instances, events
internal/cmd/api/http.go Injects instancegroup loader context into Graph request pipeline
.configs/gqlgen.yaml Adds instancegroup package to gqlgen autobind
internal/graph/gengql/applications.generated.go gqlgen output wiring Application.instanceGroups
internal/graph/gengql/instancegroup.generated.go gqlgen output for InstanceGroup types/resolvers
internal/graph/gengql/schema.generated.go gqlgen output: Node type switch + schema inclusion
internal/graph/gengql/root_.generated.go gqlgen output: resolver root + complexity hooks
internal/graph/gengql/teams.generated.go gqlgen output: Application field context updates
internal/graph/gengql/issues.generated.go gqlgen output: Application field context updates


Comment on lines +97 to +104
case env.ValueFrom != nil && env.ValueFrom.ConfigMapKeyRef != nil:
value := env.Value // will be empty for valueFrom, but we try
ev.Value = &value
ev.Source = InstanceGroupValueSource{
Kind: InstanceGroupValueSourceKindConfigMap,
Name: env.ValueFrom.ConfigMapKeyRef.LocalObjectReference.Name,
}
case env.ValueFrom != nil && env.ValueFrom.FieldRef != nil:

Copilot AI Apr 14, 2026


For env vars using ValueFrom.ConfigMapKeyRef, this sets Value to env.Value (which will be empty) and never resolves the referenced key from the ConfigMap. This contradicts the function doc that says ConfigMap variables include values. Suggest querying the ConfigMap and reading ConfigMapKeyRef.Key (honor Optional), or set Value=nil and make the doc accurate.

Contributor Author


Fixed in 64d1349 — ConfigMapKeyRef env vars now resolve their value via getConfigMapData() (K8s API call). The Optional flag is honored (debug log for optional, warn for required), and Value is nil if the lookup fails.
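
A minimal sketch of the resolution logic described in the reply, with plain maps standing in for the Kubernetes types and client lookup (the actual fix uses corev1 types and the getConfigMapData() helper, not shown here):

```go
package main

import "fmt"

// configMapKeyRef is a pared-down stand-in for corev1.ConfigMapKeySelector.
type configMapKeyRef struct {
	Name     string
	Key      string
	Optional bool
}

// resolveConfigMapEnv mirrors the behavior in the reply: look up the
// referenced key in the ConfigMap's data, log at debug for optional refs
// and warn for required ones, and leave the value nil when the lookup fails.
func resolveConfigMapEnv(ref configMapKeyRef, configMaps map[string]map[string]string) *string {
	data, ok := configMaps[ref.Name]
	if !ok {
		if ref.Optional {
			fmt.Printf("debug: optional ConfigMap %q not found\n", ref.Name)
		} else {
			fmt.Printf("warn: required ConfigMap %q not found\n", ref.Name)
		}
		return nil // Value stays nil when the lookup fails
	}
	v, ok := data[ref.Key]
	if !ok {
		return nil
	}
	return &v
}

func main() {
	cms := map[string]map[string]string{
		"app-config": {"LOG_LEVEL": "debug"},
	}
	if v := resolveConfigMapEnv(configMapKeyRef{Name: "app-config", Key: "LOG_LEVEL"}, cms); v != nil {
		fmt.Println(*v) // debug
	}
	if v := resolveConfigMapEnv(configMapKeyRef{Name: "missing", Optional: true}, cms); v == nil {
		fmt.Println("value is nil")
	}
}
```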

Comment on lines +294 to +313
if mount.SubPath != "" {
// Fetch content for the specific key
contents, err := getConfigMapFileContents(ctx, l, ig.EnvironmentName, ig.TeamSlug.String(), cmName)
if err != nil {
l.log.WithError(err).WithField("configmap", cmName).Warn("failed to resolve configmap content for subPath mount")
errMsg := fmt.Sprintf("ConfigMap '%s' could not be found or accessed. The application may fail to start until this is resolved.", cmName)
return []*InstanceGroupMountedFile{{
Path: mount.MountPath,
Source: source,
Error: &errMsg,
}}
}
f := &InstanceGroupMountedFile{
Path: mount.MountPath,
Source: source,
}
if fc, ok := contents[mount.SubPath]; ok {
f.Content = &fc.content
f.IsBinary = fc.isBinary
}

Copilot AI Apr 14, 2026


ConfigMap volume mounts with VolumeMount.SubPath are resolved using contents[mount.SubPath], but SubPath refers to the file path within the volume, not necessarily the ConfigMap key (it may match items[].Path when Items are specified). This can silently drop content for subPath mounts. Suggest mapping SubPath -> Key via the provided items (when present), otherwise fall back to using SubPath as the key.

Contributor Author


Fixed in 64d1349 — when SubPath is set and items are specified, we now map SubPath back to the actual ConfigMap key via the items list (item.Path → item.Key). Without items, SubPath is used directly as the key (unchanged).
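
A sketch of that mapping, with a simplified stand-in for corev1.KeyToPath (the actual fix operates on the unstructured volume data, not these types):

```go
package main

import "fmt"

// keyToPathItem is a pared-down stand-in for corev1.KeyToPath, the entries
// of a ConfigMap volume's "items" list.
type keyToPathItem struct {
	Key  string
	Path string
}

// keyForSubPath maps a VolumeMount.SubPath back to the ConfigMap key it
// refers to: when items are specified, SubPath matches an item's Path and
// the corresponding Key must be used; without items, keys are projected
// into the volume as file names directly, so SubPath is the key itself.
func keyForSubPath(subPath string, items []keyToPathItem) string {
	for _, it := range items {
		if it.Path == subPath {
			return it.Key
		}
	}
	return subPath
}

func main() {
	items := []keyToPathItem{{Key: "config.yaml", Path: "renamed.yaml"}}
	fmt.Println(keyForSubPath("renamed.yaml", items)) // config.yaml
	fmt.Println(keyForSubPath("other.txt", nil))      // other.txt
}
```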


// Fall back to metadata.creationTimestamp
if ts, ok, _ := unstructured.NestedString(obj.Object, "metadata", "creationTimestamp"); ok && ts != "" {
if t, err := time.Parse(time.RFC3339, ts); err == nil {

Copilot AI Apr 14, 2026


metadata.creationTimestamp is parsed with time.RFC3339, but Kubernetes timestamps often include fractional seconds. time.Parse(time.RFC3339, ts) will fail in that case, causing extractEventTimestamp to return zero and dropping events. Suggest parsing with time.RFC3339Nano (which accepts both with/without fractions) or trying both formats.

Suggested change
if t, err := time.Parse(time.RFC3339, ts); err == nil {
if t, err := time.Parse(time.RFC3339Nano, ts); err == nil {

Contributor Author


Fixed in 64d1349 — changed to time.RFC3339Nano which accepts both with and without fractional seconds.

Comment on lines +60 to +77
// Fetch events for each pod in the instance group
instances, err := ListInstances(ctx, ig)
if err != nil {
l.log.WithError(err).Warn("failed to list instances for event fetching")
} else {
for _, inst := range instances {
podEvents, err := fetchEvents(ctx, client, namespace, "Pod", inst.Name)
if err != nil {
l.log.WithError(err).WithField("pod", inst.Name).Warn("failed to fetch pod events")
continue
}
podName := inst.Name
for i := range podEvents {
if ev := translateEvent(&podEvents[i], &podName); ev != nil {
allEvents = append(allEvents, ev)
}
}
}

Copilot AI Apr 14, 2026


ListEvents issues one Kubernetes List call per pod instance (plus one for the ReplicaSet). For workloads with many replicas this becomes N+1 API calls and can add significant latency/load to the API server when the GraphQL field is queried. Consider adding server-side limits (Limit/Continue, time window), batching via a single namespace event list + in-memory filtering, and/or caching events per request.

Contributor Author


Acknowledged — for typical workloads (2-5 pods) this is acceptable. Events are fetched on-demand and K8s events have ~1 hour TTL. Can be optimized later with namespace-scoped list + in-memory filtering if needed.
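
The deferred optimization could be sketched as below: one namespace-scoped list followed by in-memory filtering, with a pared-down stand-in for corev1.Event (the actual code would make a single List call against the API server instead of one per pod):

```go
package main

import "fmt"

// event is a simplified stand-in for corev1.Event.
type event struct {
	InvolvedKind string
	InvolvedName string
	Message      string
}

// filterEvents selects, from a single namespace-wide event list, the
// events for the objects we care about (the ReplicaSet plus each pod),
// replacing N+1 per-object List calls with in-memory filtering.
func filterEvents(all []event, kind string, names map[string]bool) []event {
	var out []event
	for _, ev := range all {
		if ev.InvolvedKind == kind && names[ev.InvolvedName] {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	all := []event{
		{"Pod", "app-abc", "Started container"},
		{"Pod", "other-xyz", "Pulled image"},
		{"ReplicaSet", "app-7d9f", "Scaled up replica set"},
	}
	podEvents := filterEvents(all, "Pod", map[string]bool{"app-abc": true})
	rsEvents := filterEvents(all, "ReplicaSet", map[string]bool{"app-7d9f": true})
	fmt.Println(len(podEvents), len(rsEvents)) // 1 1
}
```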

Comment on lines +57 to +158
// transformReplicaSet strips unnecessary fields from ReplicaSets but keeps
// the data we need: labels, annotations (revision), replicas, pod template
// (containers with env/envFrom/volumeMounts/image, volumes).
func transformReplicaSet(in any) (any, error) {
rs := in.(*unstructured.Unstructured)

// --- Extract data we want to keep ---

// metadata.annotations - keep only deployment.kubernetes.io/revision
annotations := rs.GetAnnotations()
newAnnotations := map[string]string{}
if v, ok := annotations["deployment.kubernetes.io/revision"]; ok {
newAnnotations["deployment.kubernetes.io/revision"] = v
}

// metadata.labels - keep app and team
labelsToKeep := []string{"app", "team"}
labels := rs.GetLabels()
for k := range labels {
if !slices.Contains(labelsToKeep, k) {
delete(labels, k)
}
}

// spec.replicas
replicas, _, _ := unstructured.NestedInt64(rs.Object, "spec", "replicas")

// spec.template.spec.containers (keep name, image, env, envFrom, volumeMounts)
containers, _, _ := unstructured.NestedSlice(rs.Object, "spec", "template", "spec", "containers")
newContainers := make([]any, 0, len(containers))
for _, container := range containers {
c, ok := container.(map[string]any)
if !ok {
return nil, fmt.Errorf("container is not a map[string]any")
}
newC := map[string]any{
"name": c["name"],
}
if img, ok := c["image"]; ok {
newC["image"] = img
}
if env, ok := c["env"]; ok {
newC["env"] = env
}
if envFrom, ok := c["envFrom"]; ok {
newC["envFrom"] = envFrom
}
if vm, ok := c["volumeMounts"]; ok {
newC["volumeMounts"] = vm
}
newContainers = append(newContainers, newC)
}

// spec.template.spec.volumes
volumes, _, _ := unstructured.NestedSlice(rs.Object, "spec", "template", "spec", "volumes")

// status.replicas and status.readyReplicas
statusReplicas, _, _ := unstructured.NestedInt64(rs.Object, "status", "replicas")
readyReplicas, _, _ := unstructured.NestedInt64(rs.Object, "status", "readyReplicas")

// spec.template.metadata.labels (for pod matching)
templateLabels, _, _ := unstructured.NestedStringMap(rs.Object, "spec", "template", "metadata", "labels")

// metadata.ownerReferences (to link to Deployment/Application)
ownerRefs, _, _ := unstructured.NestedSlice(rs.Object, "metadata", "ownerReferences")

// --- Remove everything ---
fieldsToRemove := [][]string{
{"spec"},
{"status"},
{"metadata", "managedFields"},
{"metadata", "generateName"},
}
for _, field := range fieldsToRemove {
unstructured.RemoveNestedField(rs.Object, field...)
}

// --- Add back only what we need ---
rs.SetLabels(labels)
rs.SetAnnotations(newAnnotations)

_ = unstructured.SetNestedField(rs.Object, replicas, "spec", "replicas")
_ = unstructured.SetNestedSlice(rs.Object, newContainers, "spec", "template", "spec", "containers")
if len(volumes) > 0 {
_ = unstructured.SetNestedSlice(rs.Object, volumes, "spec", "template", "spec", "volumes")
}
if len(templateLabels) > 0 {
templateLabelsAny := make(map[string]any, len(templateLabels))
for k, v := range templateLabels {
templateLabelsAny[k] = v
}
_ = unstructured.SetNestedMap(rs.Object, templateLabelsAny, "spec", "template", "metadata", "labels")
}
if len(ownerRefs) > 0 {
_ = unstructured.SetNestedSlice(rs.Object, ownerRefs, "metadata", "ownerReferences")
}

_ = unstructured.SetNestedField(rs.Object, statusReplicas, "status", "replicas")
_ = unstructured.SetNestedField(rs.Object, readyReplicas, "status", "readyReplicas")

return rs, nil
}

Copilot AI Apr 14, 2026


This new transformer is fairly involved and is expected to run in informer WatchList paths where transforms can be applied more than once. There are tests for transformer idempotency in other workload packages (e.g. secret). Consider adding unit tests verifying transformReplicaSet is idempotent and preserves the fields needed by queries (revision, replicas, containers env/envFrom/volumeMounts, volumes, ownerRefs).

Contributor Author


Acknowledged — transformer tests are on the backlog. Will add idempotency tests in a follow-up.
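
The idempotency check such a test would perform can be sketched as follows; the transformer here is a toy allow-list stand-in, not the real transformReplicaSet, but the apply-twice-and-compare pattern is the same:

```go
package main

import (
	"fmt"
	"reflect"
)

// stripFields is a toy transformer standing in for transformReplicaSet:
// it keeps only an allow-list of top-level keys.
func stripFields(obj map[string]any) map[string]any {
	keep := map[string]bool{"metadata": true, "spec": true, "status": true}
	for k := range obj {
		if !keep[k] {
			delete(obj, k)
		}
	}
	return obj
}

// assertIdempotent applies the transform twice and checks the second pass
// is a no-op -- the property informer transform funcs must satisfy, since
// they can be applied more than once to the same cached object.
func assertIdempotent(transform func(map[string]any) map[string]any, obj map[string]any) bool {
	once := transform(obj)
	snapshot := deepCopy(once)
	twice := transform(once)
	return reflect.DeepEqual(snapshot, twice)
}

// deepCopy clones nested string-keyed maps so the snapshot is independent
// of the object mutated by the second transform pass.
func deepCopy(m map[string]any) map[string]any {
	out := map[string]any{}
	for k, v := range m {
		if mv, ok := v.(map[string]any); ok {
			out[k] = deepCopy(mv)
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	obj := map[string]any{
		"metadata": map[string]any{"name": "app-7d9f"},
		"spec":     map[string]any{"replicas": int64(3)},
		"junk":     "drop-me",
	}
	fmt.Println(assertIdempotent(stripFields, obj)) // true
}
```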

Lua integration tests covering:
- Application with instance groups (ReplicaSet filtering)
- Instance group with instances (Pod listing)
- Instance group with environment variables (env, configMapKeyRef, secretKeyRef, envFrom)
- Instance group with mounted files (ConfigMap/Secret volume mounts)

Also includes dataloader transformer idempotency unit test.

Include image name, container name, pod name, and rescale size
in translated event messages instead of dropping them. Also remove
nais.yaml tips from scheduling failure messages.

…ss, and last termination info

Extend ApplicationInstanceStatus with ready, lastExitReason, and lastExitCode
fields to give users actionable insight into instance health. Replace raw
Kubernetes reason strings with human-readable messages. Add TERMINATED state
for containers that exited successfully. Fix State() to use classifyWaiting
so benign startup states no longer incorrectly report Application as NOT_RUNNING.

…atterns

- Replace benignWaiting map[string]struct{} with slices.Contains
- Separate benign and problem message maps, remove redundant switch
- Replace sort.Slice with slices.SortFunc using time.Compare
- Replace sort.Strings with slices.Sorted(maps.Keys(...))
- Replace IsValid() switch statements with slices.Contains
@frodesundby frodesundby marked this pull request as ready for review April 14, 2026 13:58
@frodesundby frodesundby requested a review from a team as a code owner April 14, 2026 13:58